I analyzed the distribution of trees in London, UK, by comparing the number of recorded tree locations in each borough of Greater London. This information is useful for anyone who wants to know whether a given area lacks trees, so that planting efforts can be directed to places with fewer of them. Trees matter for an area's quality of life: they filter nearby air and water, remove carbon, and act as cooling agents.
The dataset of tree locations was found on the London Datastore (https://data.london.gov.uk/dataset/local-authority-maintained-trees) and was collected by the Greater London Authority.
The shapefile of the boroughs in Greater London was found on the European Commission’s website (https://data.europa.eu/data/datasets/statistical-gis-boundary-files-for-london/?locale=en).
To prepare the data for analysis, I first added geometry to the tree locations using the longitude and latitude provided. I then transformed their coordinate reference system (CRS) to EPSG:27700 (OSGB36 / British National Grid). This is the local CRS for London, and the borough data already uses it.
```r
library(dplyr)
library(sf)

## Convert columns 12:13 (the coordinates) to numeric
num.trees <- tree.loc %>%
  mutate(across(12:13, as.numeric))

## One point has a corrupted coordinate (51.585, 20180214) that would skew the data
## Remove this point from the data by its row number
cl.trees <- num.trees %>%
  filter(row_number() != 764094)

## Keep coordinates and borough, recode blank strings to NA, then drop NAs
cn.trees <- cl.trees %>%
  select(lon, lat, borough) %>%
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%  # recode empty strings "" as NA
  na.omit()                                                  # remove rows containing NAs

## Make trees spatial, declaring WGS84 longitude/latitude (EPSG:4326)
trees_sf <- cn.trees %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, remove = FALSE)

## Transform tree location CRS to OSGB36 / British National Grid (EPSG:27700)
trees_sf <- st_transform(trees_sf, 27700)

## Make borough shapefile geometries valid
boroughs <- st_make_valid(boroughs)

## Remove extra columns in boroughs
cl.boroughs <- boroughs %>%
  select(!c("HECTARES", "NONLD_AREA", "ONS_INNER", "SUB_2009", "SUB_2006"))
```
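The row-number filter above drops the corrupted record. It stops working, though, if the file is re-downloaded or reordered. One alternative is to keep only coordinates that fall inside a rough box around Greater London. The sketch below takes that approach. The helper `in_london_bbox` and its box limits are assumptions for illustration, not values taken from the dataset.

```r
# Hypothetical helper: TRUE for points inside an approximate bounding box
# around Greater London. The default limits are rough assumptions.
in_london_bbox <- function(lon, lat,
                           lon_range = c(-0.6, 0.4),
                           lat_range = c(51.2, 51.8)) {
  !is.na(lon) & !is.na(lat) &
    lon >= lon_range[1] & lon <= lon_range[2] &
    lat >= lat_range[1] & lat <= lat_range[2]
}

# Example: the second point has a date-like value in one coordinate,
# similar to the corrupted record noted above.
lon <- c(-0.1276, 20180214)
lat <- c(51.5072, 51.585)
in_london_bbox(lon, lat)   # TRUE for the first point, FALSE for the second
```

In the pipeline above, this could replace the row-number step, e.g. `num.trees %>% filter(in_london_bbox(lon, lat))`. That assumes the numeric coordinate columns are named `lon` and `lat`. Rows with missing coordinates are also dropped, because the helper returns FALSE for them.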
The table below already shows a large range in the number of trees per borough: the maximum is 77,589 and the minimum is 252. The median (19,583) is also smaller than the mean (24,309.08). This suggests the distribution is right-skewed, with most boroughs having relatively few trees and a few boroughs having many.
| Metric | Number of trees |
|---|---|
| Sum | 875,127 |
| Mean | 24,309.08 |
| Median | 19,583 |
| Minimum | 252 |
| Maximum | 77,589 |
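The statistics in the table can be reproduced from a vector of per-borough tree counts. So can the mean-versus-median reasoning about skew. The sketch below uses hypothetical helper names (`summarise_counts`, `is_right_skewed`) and made-up counts, not the real borough data. Comparing the mean with the median is only a rule of thumb for the direction of skew, not a formal test.

```r
# Hypothetical helper: the summary statistics reported in the table
summarise_counts <- function(n) {
  c(Sum = sum(n), Mean = mean(n), Median = median(n),
    Minimum = min(n), Maximum = max(n))
}

# Rule of thumb: a mean above the median suggests a right (positive) skew
is_right_skewed <- function(n) mean(n) > median(n)

# Made-up counts for five boroughs (illustration only, not the real data)
counts <- c(252, 8000, 12000, 19583, 77589)
summarise_counts(counts)
is_right_skewed(counts)   # TRUE: one very large count pulls the mean up
```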
This bar chart shows the number of trees in each borough. Two boroughs have over 75,000 trees, but most have fewer than 40,000. This reiterates that the distribution is skewed toward lower tree counts per borough.
The choropleth shows that more trees are reported in the northern part of Greater London and fewer in the southern part.
Moran’s I is 0.049. The p-value is 0.00175.
```r
library(spdep)

## Use Moran's I to analyze whether the data is more clustered or dispersed.
# Create Queen case neighbors (polygons sharing an edge or a corner)
tr.bo_queen <- poly2nb(tr.bo_sf,
                       queen = TRUE)
# Convert neighbors to a spatial weights list
tr.bo_weights <- nb2listw(tr.bo_queen,
                          style = "B",          # B is binary (1 = neighbor, 0 = not)
                          zero.policy = TRUE)   # zero.policy allows observations with NO neighbors
# Moran's I
tr.bo.moran <- moran.test(tr.bo_sf$n,           # count column (n) in the sf object
                          tr.bo_weights,        # weights object
                          zero.policy = TRUE,   # allows observations with NO neighbors
                          randomisation = TRUE) # variance of I computed under the randomisation assumption
# Print Moran's I
tr.bo.moran
```
```text
Moran I test under randomisation

data:  tr.bo_sf$n
weights: tr.bo_weights

Moran I statistic standard deviate = 2.9201, p-value = 0.00175
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance
     0.0490954552     -0.0058479532      0.0003540275
```
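The sketch below computes global Moran's I from first principles to show what the statistic measures. It uses the standard definition I = (n / S0) × Σi Σj wij zi zj / Σi zi², where zi = xi − mean(x) and S0 is the sum of all weights. Under no spatial autocorrelation, its expected value is −1/(n − 1). The helper names `morans_i` and `expected_i` and the four-region layout are hypothetical. Unlike `moran.test`, this sketch does not compute a variance or p-value.

```r
# Global Moran's I for values x and a spatial weights matrix W
# (zero diagonal; binary weights, as with style = "B" above).
morans_i <- function(x, W) {
  stopifnot(is.matrix(W), nrow(W) == length(x), ncol(W) == length(x))
  n  <- length(x)
  z  <- x - mean(x)                 # deviations from the mean
  s0 <- sum(W)                      # total of all weights
  (n / s0) * sum(W * outer(z, z)) / sum(z^2)
}

# Expected value of I under the null hypothesis of no spatial autocorrelation
expected_i <- function(n) -1 / (n - 1)

# Hypothetical layout: four regions in a row (1-2-3-4), neighbors share an edge
W <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

morans_i(c(1, 2, 3, 4), W)   # 1/3: similar values sit next to each other
morans_i(c(1, 4, 1, 4), W)   # -1: high and low values alternate
expected_i(4)                # -1/3
```

Values above the expectation indicate clustering of similar values, and values below it indicate dispersion. This is why the size of I relative to its expectation matters more than its distance from 1 or −1.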
The LISA map shows one high-high cluster (high values whose neighbors also have high values) in the north of Greater London. It also shows a smaller low-low cluster (low values whose neighbors also have low values).
The LISA summary also reports a few low-high areas (low values surrounded by high-value neighbors) and high-low areas (high values surrounded by low-value neighbors). These areas do not show up as distinct features on the map, so I treat them as too weak to be relevant.
| LISA category | 0 | HH | HL | LH | LL |
|---|---|---|---|---|---|
| Count | 151 | 6 | 2 | 5 | 8 |
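The HH/HL/LH/LL labels come from comparing each area's standardized value with the mean standardized value of its neighbors (the spatial lag). The code that produced the counts above is not shown, so the sketch below is only one common way to assign these quadrants. It assumes, as the table suggests, that "0" marks areas whose local test was not significant. The function `lisa_quadrant`, its `alpha` default, and the input `p` (local p-values, e.g. from a local Moran's I test) are all hypothetical.

```r
# Hypothetical sketch: assign LISA quadrants from values x, a binary
# weights matrix W, and a vector p of local p-values.
lisa_quadrant <- function(x, W, p, alpha = 0.05) {
  z   <- (x - mean(x)) / sd(x)                       # standardized values
  lag <- as.vector(W %*% z) / pmax(rowSums(W), 1)    # mean of neighbors' z (0 if no neighbors)
  q <- ifelse(z > 0,
              ifelse(lag > 0, "HH", "HL"),
              ifelse(lag > 0, "LH", "LL"))
  q[p > alpha] <- "0"                                # not significant
  q
}

# Hypothetical layout: four regions in a row (1-2-3-4)
W <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

q <- lisa_quadrant(c(10, 9, 2, 1), W, p = c(0.01, 0.20, 0.01, 0.01))
q          # "HH" "0" "LL" "LL"
table(q)   # counts per category, in the same form as the summary above
```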
Moran's I is 0.049. A positive value points toward clustering rather than dispersion. However, 0.049 is only slightly above the value expected under spatial randomness (about -0.006), so the clustering is very weak. The p-value, 0.00175, is smaller than the alpha level of 0.05, so this weak positive autocorrelation is statistically significant. Overall, the data is close to random, and there is no strong, consistent pattern of clustering or dispersion.
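The standard deviate in the `moran.test` output is z = (I − E[I]) / sqrt(Var(I)). Its one-sided p-value (alternative "greater") is the upper tail of the standard normal distribution. The short check below recomputes both from the three printed sample estimates. The helper name `moran_z` is hypothetical.

```r
# Recompute the standard deviate and one-sided p-value from the printed estimates
moran_z <- function(I, expectation, variance) (I - expectation) / sqrt(variance)

z <- moran_z(0.0490954552, -0.0058479532, 0.0003540275)
p <- pnorm(z, lower.tail = FALSE)   # alternative hypothesis: greater
round(z, 4)    # matches the printed standard deviate, 2.9201
signif(p, 3)   # matches the printed p-value, 0.00175
```

This makes clear that a small I can still be significant. The variance of I is very small here, so even a modest gap between I and its expectation yields a z-score near 3.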
The LISA choropleth map shows a cluster of high tree counts in the north of Greater London and a lack of trees near the City of London. The low-value cluster at the City of London makes sense because it has less green space. I do not yet know why there is a high-value cluster near Enfield or Waltham Forest (other than the latter's name including "Forest"). Further research could investigate the reason, for example whether these areas are more residential or have wealthier inhabitants.
The primary limitation of this data is that the tree location records are not consistent in time. Many of the tree locations are from 2021, but some boroughs had not submitted updated numbers, so their data is from 2018 or 2015/2016.
The analysis shows no strong pattern of clustering or dispersion in the total number of tree locations per borough. To me, this shows that a few boroughs have many more trees while others lack them. Further research could examine whether the lack of trees in these other boroughs affects their quality of life, for example whether it increases the heat felt in summer.